Article Outline
情景描述:
上周,由于产品嫌报告生成太慢,经过使用
profile/gprof2dot
研究后,发现主要时间耗费在接口网络请求上,于是我决定在项目中大量处理I/O网络请求的地方使用gevent
,以缓解报告生成压力。
<!--more-->
实现代码:
import gevent
from gevent import monkey, pool
monkey.patch_socket()
p = pool.Pool(300)
def requests_parse(self, tel_tuple):
"""
主要处理requests请求
"""
print('Size of pool', len(p))
……
……
def generator_label(self, tel_data):
"""
使用gevent实现协程处理I/O网络请求
"""
jobs = [p.spawn(self.requests_parse, tel) for tel in tel_data]
gevent.joinall(jobs)
tlist = [x.value for x in jobs]
if None in tlist:
message_list = [x.exception for x in jobs]
self.logger.error(tlist)
self.logger.error(message_list)
raise Exception(message_list)
return tlist
def update_data_step(self, tel):
"""
主要将请求回来处理好的数据写入数据库(mongo)
"""
res = self.db.update()
……
……
tel_data = [……] # 一大堆待请求参数list
tlist = self.generator_label(set(tel_data))
map(lambda x: self.update_data_step(x[0])(x[1],x[2],x[3],x[4],x[5],x[6]), tlist)
问题:
上线后发现,代码运行一段时间后,请求一上来,任务数直线上升,一直增加: 但是,pool的数量是正常的: 之后log里报错信息:
File "run.py", line 42, in update_data
File "report/generate/calls_sa_by_tel.py", line 396, in run
File "report/generate/calls_sa_by_tel.py", line 38, in base_call
File "venv/lib/python2.7/site-packages/pymongo/cursor.py", line 729, in count
File "venv/lib/python2.7/site-packages/pymongo/collection.py", line 1344, in _count
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
File "venv/lib/python2.7/site-packages/pymongo/mongo_client.py", line 904, in _socket_for_reads
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
File "venv/lib/python2.7/site-packages/pymongo/mongo_client.py", line 870, in _get_socket
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
File "venv/lib/python2.7/site-packages/pymongo/server.py", line 168, in get_socket
File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 844, in get_socket
File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 881, in _get_socket_no_auth
File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 817, in connect
File "venv/lib/python2.7/site-packages/pymongo/pool.py", line 263, in _raise_connection_failure
AutoReconnect: xxx.xxx.xxx.117:27017: [Errno 24] Too many open files
解决办法:
经过分析,解决办法:
monkey.patch_socket()
换为monkey.patch_all()
,或者,在使用完gevent
后使用reload(socket)
将socket初始化。 原因:应该是mongo写数据是阻塞的,请求快于写操作,导致写操作堆积越来越多,最终导致程序抛出Too many open files
错误。 最终代码,如下:
import gevent
from gevent import monkey, pool
monkey.patch_all()
def generator_label(self, tel_data):
p = pool.Pool(10)
jobs = [p.spawn(self.requests_parse, tel) for tel in tel_data]
gevent.joinall(jobs)
# print [x.value for x in jobs]
tlist = [x.value for x in jobs]
if None in tlist:
message_list = [x.exception for x in jobs]
self.logger.error(tlist)
self.logger.error(message_list)
raise Exception(message_list)
return tlist
# 或者使用
p = pool.Pool(10)
jobs = [p.spawn(self.requests_parse, tel) for tel in tel_data]
gevent.joinall(jobs)
import socket
reload(socket)
另一个用例:
给传递两个参数,直接后面跟着就行,逗号分开; 返回如是多个情况的,value是一个以tuple为元素的list。
p = pool.Pool(10) jobs = []
# date ('2017-07-01', '2017-07-14')
jobs = [p.spawn(self.crawl_a_call_log, date[0], date[1]) for date in dates]
gevent.joinall(jobs)
data_list = [x.value for x in jobs]
print data_list
if None in data_list:
message_list = [x.exception for x in jobs]
self.log('crawler', data_list, message_list)
raise Exception(message_list)
```
总结
协程(gevent)是把双刃剑,monkey.patch 是一个邪恶的东西; 提升效果不要太好,耗时足足降了60%,而且,请求越多,效果越明显。